KMID : 1022420150070010003

Phonetics and Speech Sciences
2015, Vol. 7, No. 1, pp. 3-10

Evaluation of Frequency Warping Based Features and Spectro-Temporal Features for Speaker Recognition

Choi Young-Ho
Ban Sung-Min
Kim Kyung-Wha
Kim Hyung-Soon

Abstract

In this paper, different frequency scales for cepstral feature extraction are evaluated for text-independent speaker recognition. To this end, mel-frequency cepstral coefficients (MFCCs), linear frequency cepstral coefficients (LFCCs), and bilinear warped frequency cepstral coefficients (BWFCCs) are applied in speaker recognition experiments. In addition, the spectro-temporal features extracted by the cepstral-time matrix (CTM) are examined as an alternative to the delta and delta-delta features. Experiments on the NIST speaker recognition evaluation (SRE) 2004 task are carried out using the Gaussian mixture model-universal background model (GMM-UBM) method and the joint factor analysis (JFA) method, both based on the ALIZE 3.0 toolkit. Experimental results with both methods show that BWFCC with an appropriate warping factor yields better performance than MFCC and LFCC. It is also shown that the feature set including the spectro-temporal information based on the CTM outperforms the conventional feature set including the delta and delta-delta features.
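
The abstract does not state the exact formulas used, so the following is only an illustrative sketch of the two ideas it names. It assumes the commonly used first-order all-pass form of bilinear frequency warping and a cepstral-time matrix obtained by applying a DCT along the time axis of a block of cepstral frames. The function names, the warping factor `alpha`, and the parameters `block_len` and `n_time_coeffs` are hypothetical and are not taken from the paper.

```python
import numpy as np


def bilinear_warp(omega, alpha):
    """Warp angular frequency omega (radians, 0..pi) with a first-order
    all-pass bilinear transform. alpha = 0 leaves the axis linear;
    0 < alpha < 1 stretches the low-frequency region (assumed form)."""
    omega = np.asarray(omega, dtype=float)
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega),
                                    1.0 - alpha * np.cos(omega))


def dct_ii_basis(n_out, n_in):
    """Unnormalized DCT-II basis of shape (n_out, n_in)."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / n_in)


def cepstral_time_matrix(cepstra, block_len, n_time_coeffs):
    """Slide a window of block_len frames over cepstra (n_frames, n_ceps),
    apply a DCT along time, and flatten each resulting matrix.

    Returns an array of shape (n_blocks, n_time_coeffs * n_ceps). The first
    n_ceps values of each row are the zeroth time coefficients (a scaled
    block average); the rest describe how each cepstral coefficient varies
    over the block, which is the role delta features usually play."""
    cepstra = np.asarray(cepstra, dtype=float)
    n_frames, n_ceps = cepstra.shape
    basis = dct_ii_basis(n_time_coeffs, block_len)
    rows = []
    for start in range(n_frames - block_len + 1):
        block = cepstra[start:start + block_len]   # (block_len, n_ceps)
        rows.append((basis @ block).ravel())       # (n_time_coeffs * n_ceps,)
    return np.array(rows).reshape(-1, n_time_coeffs * n_ceps)
```

In such a setup, the center frequencies of the filterbank would be placed uniformly on the warped axis before the cepstral DCT, and the flattened CTM rows would replace the usual static plus delta plus delta-delta vector. How the paper chooses the warping factor and the block size is not stated in the abstract.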

Keywords

speaker recognition, GMM-UBM, JFA, MFCC, LFCC, BWFCC, delta feature, cepstral-time matrix
